Semiconductor EU stocks

Overview

This project analyzes the stock markets of major European semiconductor companies. The goal of the project is to retrieve financial data from yfinance and use it to forecast the companies' stock prices with time series analysis and machine learning. The results can be applied to trading and financial decision-making. Note that this project itself does not provide such decision-making; it serves only as a general analysis and guideline.

Initially this project also featured an attempt at forecasting stock close prices with a hybrid LSTM-ARIMA model (inspired by this paper), but after many failed attempts it was scrapped. The original paper does not use the model for time series forecasting, but for trend and buy/sell signal detection. There are many other examples of LSTMs being used to forecast stock data, but for a very volatile market they might not be the best fit.

Data cleansing

Code
import yfinance as yf
import random
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import time
from datetime import datetime, timedelta

tickers = [
    "ASML.AS", "NXPI", "IFX.DE", "BESI.AS",
    "NOD.OL", "MELE.BR", "AIXA.DE", "SMHN.DE", "AWEVF"
]
all_data = {}
yesterday = datetime.today() - timedelta(days=1)
yesterday_str = yesterday.strftime('%Y-%m-%d')

#Fetch data with a fixed end date (yesterday) so reruns give consistent results
for ticker in tickers:
    for attempt in range(3):
        try:
            stock = yf.Ticker(ticker)
            hist = stock.history(period="max", end=yesterday_str)
            if hist is None or hist.empty:
                display(f"No data for {ticker}, attempt {attempt+1}")
                time.sleep(2)
                continue
            all_data[ticker] = hist
            break
        except Exception as e:
            display(f"Error fetching {ticker}: {e}, attempt {attempt+1}")
            time.sleep(2)

#Check out ASML data as a test
if "ASML.AS" in all_data:
    display("ASML stocks tail")
    display(all_data["ASML.AS"].tail())
else:
    display("ASML.AS data not available")

#Clean and process the data into continuous daily time series
processed_data = {}

for ticker, df in all_data.items():
    if df.empty:
        continue
    df.index = df.index.tz_localize(None) 
    
    df_continuous = df.asfreq('D')
    
    cols_to_ffill = ['Open', 'High', 'Low', 'Close', 'Adj Close']
    existing_cols = [c for c in cols_to_ffill if c in df_continuous.columns]
    df_continuous[existing_cols] = df_continuous[existing_cols].ffill()
    
    if 'Volume' in df_continuous.columns:
        df_continuous['Volume'] = df_continuous['Volume'].fillna(0)
    
    processed_data[ticker] = df_continuous
'ASML stocks tail'
                                  Open         High          Low        Close  Volume  Dividends  Stock Splits
Date
2026-02-09 00:00:00+01:00  1200.000000  1205.000000  1177.400024  1204.800049  456425        1.6           0.0
2026-02-10 00:00:00+01:00  1196.000000  1212.400024  1185.800049  1193.000000  459476        0.0           0.0
2026-02-11 00:00:00+01:00  1185.400024  1224.000000  1176.599976  1207.800049  530632        0.0           0.0
2026-02-12 00:00:00+01:00  1225.000000  1225.000000  1176.599976  1179.800049  558696        0.0           0.0
2026-02-13 00:00:00+01:00  1190.599976  1210.599976  1173.800049  1190.400024  708101        0.0           0.0
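The `asfreq('D')` + `ffill` step above is what turns an exchange calendar (weekdays only) into a continuous daily series. A minimal sketch of that behavior on made-up data (the dates and prices are purely illustrative):

```python
import pandas as pd

# Business-day index with a weekend gap between Fri 2024-01-05 and Mon 2024-01-08
idx = pd.bdate_range("2024-01-04", "2024-01-09")
df = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0]}, index=idx)

# Reindex to calendar days; the weekend appears as NaN,
# then Friday's close is carried forward over it
daily = df.asfreq("D")
daily["Close"] = daily["Close"].ffill()

print(daily)
```

The weekend rows (Jan 6 and 7) inherit Friday's close, so downstream models see an unbroken daily index.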

Line chart plot

After cleaning and processing the data, the next step is to visualize the stock prices in a clean line chart. Plotly offers some of the cleanest and most interactive visualizations for this. Plotly has downsides, however, the main ones being heavy memory use and slowness, which is why it is not recommended for large-scale data analytics.

Code
fig = go.Figure()
for ticker, data in processed_data.items():
    fig.add_trace(
        go.Scatter(
            x=data.index,
            y=data['Close'],
            mode='lines',
            name=f"{ticker} Close"
        )
    )
fig.update_layout(
    title="European Semiconductor Companies - Close Prices",
    xaxis_title="Time",
    yaxis_title="Close Price (€ or $ depending on listing)",
    legend_title="Company"
)

fig.show()
Figure 1: Time series line plot

The same line chart is also plotted for the last 500 days only.

Code
fig = go.Figure()
for ticker, data in processed_data.items():
    data = data.tail(500)
    print(data['Close'].min(), data['Close'].max())
    fig.add_trace(
        go.Scatter(
            x=data.index,
            y=data['Close'],
            mode='lines',
            name=f"{ticker} Close"
        )
    )

fig.update_layout(
    title="European Semiconductor Companies - Close Prices of last 500 days",
    xaxis_title="Time",
    yaxis_title="Close Price (€ or $ depending on listing)",
    legend_title="Company"
)

fig.show()
545.1442260742188 1223.158447265625
151.41038513183594 249.75
24.344999313354492 43.5099983215332
77.80705261230469 176.1999969482422
92.66000366210938 169.10000610351562
41.524349212646484 74.73664093017578
9.099449157714844 22.770000457763672
24.040000915527344 69.95320892333984
1.100000023841858 2.9700000286102295
Figure 2: Time series line plot

MACD analysis

Next is the MACD (Moving Average Convergence Divergence) analysis. MACD is a commonly used indicator in financial statistics and trading: it reveals general trends in a stock for buying and selling. It is computed as the difference between the 12-day and 26-day exponential moving averages of the close price, with a 9-day EMA of the MACD serving as the signal line. It's recommended to zoom in on the plots to see the MACD results and the candlestick chart better.

Code
from plotly.subplots import make_subplots

for ticker, data in all_data.items():
    
    data['EMA12'] = data['Close'].ewm(span=12, adjust=False).mean()

    
    data['EMA26'] = data['Close'].ewm(span=26, adjust=False).mean()

    
    data['MACD'] = data['EMA12'] - data['EMA26']

    
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()

    
    last_row = data.iloc[-1]
    second_last_row = data.iloc[-2]

    if second_last_row['MACD'] > second_last_row['Signal_Line'] and last_row['MACD'] < last_row['Signal_Line']:
        print(f'{ticker}: Cross Below Signal Line → Potential Bearish Signal')
    elif second_last_row['MACD'] < second_last_row['Signal_Line'] and last_row['MACD'] > last_row['Signal_Line']:
        print(f'{ticker}: Cross Above Signal Line → Potential Bullish Signal')

    #No crossover: report the current market trend instead
    else:
        
        if last_row['MACD'] > last_row['Signal_Line']:
            trend = 'Bullish Trend'
        elif last_row['MACD'] < last_row['Signal_Line']:
            trend = 'Bearish Trend'
        else:
            trend = 'Neutral / Flat'
        print(f'{ticker}: No Crossover → {trend}')

for ticker, data in all_data.items():
    fig = make_subplots(
        rows=2, cols=1,
        shared_xaxes=True,
        vertical_spacing=0.1,
        row_heights=[0.7, 0.3],
        subplot_titles=(f'{ticker} Price', 'MACD')
    )

    
    fig.add_trace(go.Candlestick(
        x=data.index,
        open=data['Open'],
        high=data['High'],
        low=data['Low'],
        close=data['Close'],
        name='Price'
    ), row=1, col=1)

    
    fig.add_trace(go.Scatter(
        x=data.index,
        y=data['MACD'],
        mode='lines',
        name='MACD',
        line=dict(color='green')
    ), row=2, col=1)

    
    fig.add_trace(go.Scatter(
        x=data.index,
        y=data['Signal_Line'],
        mode='lines',
        name='Signal Line',
        line=dict(color='red')
    ), row=2, col=1)

    
    macd_hist = data['MACD'] - data['Signal_Line']
    fig.add_trace(go.Bar(
        x=data.index,
        y=macd_hist,
        name='MACD Histogram',
        marker_color=['green' if val >= 0 else 'red' for val in macd_hist],
        opacity=0.6
    ), row=2, col=1)

    
    fig.update_layout(
        title=f'{ticker} Candlestick & MACD',
        xaxis_rangeslider_visible=False,
        legend=dict(x=0, y=1.15, orientation='h'),
        height=700
    )

    fig.show()
ASML.AS: No Crossover → Bearish Trend
NXPI: No Crossover → Bullish Trend
IFX.DE: Cross Above Signal Line → Potential Bullish Signal
BESI.AS: No Crossover → Bearish Trend
NOD.OL: No Crossover → Bullish Trend
MELE.BR: No Crossover → Bearish Trend
AIXA.DE: No Crossover → Bullish Trend
SMHN.DE: No Crossover → Bearish Trend
AWEVF: No Crossover → Bullish Trend
Figure 3: Candlestick and MACD charts for each ticker (panels a–i)
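The `ewm(span=12, adjust=False)` call used above implements the standard EMA recursion EMA_t = α·x_t + (1 − α)·EMA_{t−1} with α = 2/(span + 1). A quick sketch verifying that on a toy series (the prices are arbitrary):

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])
span = 3
alpha = 2 / (span + 1)  # smoothing factor; 0.5 for span=3

# Manual recursion: seed with the first value, then blend each new price in
manual = [prices.iloc[0]]
for x in prices.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])

# With adjust=False, pandas applies exactly this recursion
ema = prices.ewm(span=span, adjust=False).mean()
print(ema.tolist())
print(manual)
```

The MACD line is then simply the difference of two such EMAs (span 12 minus span 26), which is why it reacts faster than either average alone.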

RSI analysis

The next technical indicator analysis is RSI (Relative Strength Index). The indicator helps to identify overbought and oversold conditions and the corresponding buy and sell signals. Using RSI and MACD together is a solid way to read stock market trends for trading.

Code
from ta.momentum import RSIIndicator

for ticker, data in all_data.items():
    close_values = data['Close']
     
    rsi_14 = RSIIndicator(close=close_values, window=14)
    rsi_series = rsi_14.rsi()
    
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=close_values.index, 
        y=rsi_series, 
        mode='lines', 
        name=f'{ticker} RSI'
    ))
    
    fig.add_hline(y=70, line_dash="dash", line_color="red", annotation_text="Overbought")
    fig.add_hline(y=30, line_dash="dash", line_color="green", annotation_text="Oversold")
    
    fig.update_layout(
        title=f"RSI (14) for {ticker}",
        xaxis_title="Date",
        yaxis_title="RSI",
        yaxis=dict(range=[0, 100])
    )
    
    fig.show()
Figure 4: RSI (14) charts for each ticker (panels a–i)
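The `RSIIndicator` used above hides the arithmetic; RSI can be sketched from first principles with Wilder's smoothing (a simplified version that may differ slightly from the `ta` implementation near the start of the series):

```python
import pandas as pd

def rsi_wilder(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI via Wilder's exponential smoothing of average gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    # Wilder's smoothing is an EMA with alpha = 1/window
    avg_gain = gain.ewm(alpha=1 / window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Toy example: a rising then falling price path (synthetic data)
prices = pd.Series([float(p) for p in range(100, 131)]
                   + [float(p) for p in range(129, 110, -1)])
rsi = rsi_wilder(prices)
print(rsi.tail())
```

A sustained rise pushes RSI toward the overbought region above 70, and the subsequent decline pulls it back down, which is exactly the oscillation visible in Figure 4.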

Q-Q-Plot

The Q-Q plot is an important sanity check for the market data. It shows whether the data aligns with a reference probability distribution, here the standard normal: points falling on a straight line indicate agreement with the distribution.

Code
import statsmodels.api as sm
import matplotlib.pyplot as plt
from arch import arch_model
import datetime as dt
from scipy.stats import norm

for ticker, data in all_data.items():
    returns = 100 * data['Close'].pct_change().dropna()
    
    # Sort the sample
    sorted_returns = np.sort(returns)
    n = len(sorted_returns)
    
    # Compute theoretical quantiles from standard normal
    p = (np.arange(1, n+1) - 0.5) / n
    theoretical_quantiles = norm.ppf(p)
    
    # Reference line (45-degree line)
    ref_line = [theoretical_quantiles.min(), theoretical_quantiles.max()]
    
    # Plot with Plotly
    qq_fig = go.Figure()
    
    # Scatter points
    qq_fig.add_trace(go.Scatter(
        x=theoretical_quantiles,
        y=sorted_returns,
        mode='markers',
        name='Data'
    ))
    
    # 45-degree reference line
    qq_fig.add_trace(go.Scatter(
        x=ref_line,
        y=ref_line,
        mode='lines',
        line=dict(color='red', dash='dash'),
        name='Fit Line'
    ))
    
    qq_fig.update_layout(
        title=f'{ticker} Returns Q-Q Plot',
        xaxis_title='Theoretical Quantiles',
        yaxis_title='Sample Quantiles'
    )
    
    qq_fig.show()
Figure 5: Q-Q plots of daily returns for each ticker (panels a–i)
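The theoretical quantiles above come from the plotting positions p_i = (i − 0.5)/n mapped through the normal inverse CDF. The same computation works with only the standard library (no scipy), shown here on a tiny sample:

```python
from statistics import NormalDist

n = 5
# Plotting positions: evenly spaced probabilities that avoid 0 and 1
p = [(i - 0.5) / n for i in range(1, n + 1)]

# Standard normal inverse CDF from the stdlib
quantiles = [NormalDist().inv_cdf(pi) for pi in p]
print([round(q, 4) for q in quantiles])
# → approximately [-1.2816, -0.5244, 0.0, 0.5244, 1.2816]
```

The resulting quantiles are symmetric around zero; plotting the sorted sample against them gives the straight line expected under normality.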

GARCH model

The GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model is a popular statistical model for time series analysis, especially in trading and quantitative finance. Its main application in finance is to examine and forecast market volatility. This is especially important for volatile, high-risk markets like the semiconductor market.

Code
from arch import arch_model
split_date = dt.datetime(2026, 2, 11)

for ticker, data in all_data.items():
    returns = 100 * data['Close'].pct_change().dropna()
    
    # Fit GARCH(1,1)
    am = arch_model(returns, vol='Garch', p=1, q=1, dist='normal')
    res = am.fit(update_freq=5, disp='off', last_obs=split_date, options={'ftol': 1e-4})
    display(res.summary())
    
    # Fixed GARCH parameters
    fixed_res = am.fix([0.0235, 0.01, 0.06, 0.0])
    display("Fixed results:")
    display(fixed_res.summary())
    
    # Compare volatility estimates
    df_vol = pd.concat([res.conditional_volatility, fixed_res.conditional_volatility], axis=1)
    df_vol.columns = ["Estimated", "Fixed"]
    
    vol_fig = go.Figure()
    vol_fig.add_trace(go.Scatter(x=df_vol.index, y=df_vol["Estimated"], mode='lines', name='Estimated'))
    vol_fig.add_trace(go.Scatter(x=df_vol.index, y=df_vol["Fixed"], mode='lines', name='Fixed'))
    vol_fig.update_layout(
        title=f"{ticker}: Estimated vs Fixed Volatility",
        xaxis_title="Date",
        yaxis_title="Volatility"
    )
    vol_fig.show()
    
    # Forecasting
    forecasts = res.forecast(horizon=5, align='origin')
    display(forecasts.variance.dropna().head())
    
    forecast_var = forecasts.variance.iloc[-1]
    forecast_vol_annual = np.sqrt(forecast_var) * np.sqrt(252)
    
    cond_vol_annual = res.conditional_volatility * np.sqrt(252)
    realized_vol_annual = returns.rolling(window=5).std() * np.sqrt(252)
    
    forecast_dates = pd.date_range(start=split_date + pd.Timedelta(days=1), periods=5, freq='B')
    forecast_series = pd.Series(forecast_vol_annual.values, index=forecast_dates)
    
    # Plot forecast + realized + in-sample volatility
    plot_start = returns.index[max(0, len(returns)-100)]
    forecast_fig = go.Figure()
    forecast_fig.add_trace(go.Scatter(
        x=realized_vol_annual.loc[plot_start:].index,
        y=realized_vol_annual.loc[plot_start:],
        mode='lines',
        name='Realized Vol (5-day)',
        line=dict(color='gray'),
        opacity=0.3
    ))
    forecast_fig.add_trace(go.Scatter(
        x=cond_vol_annual.loc[plot_start:split_date].index,
        y=cond_vol_annual.loc[plot_start:split_date],
        mode='lines',
        name='In-Sample GARCH',
        line=dict(color='blue')
    ))
    forecast_fig.add_trace(go.Scatter(
        x=forecast_series.index,
        y=forecast_series.values,
        mode='lines',
        name='5-Day Out-of-Sample Forecast',
        line=dict(color='red')
    ))
    forecast_fig.add_vline(x=split_date, line=dict(color='black', dash='dash'))
    forecast_fig.update_layout(
        title=f'Volatility Forecast: {ticker}',
        yaxis_title='Annualized Volatility (%)',
        xaxis_title='Date'
    )
    forecast_fig.show()
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -18034.3
Distribution: Normal AIC: 36076.5
Method: Maximum Likelihood BIC: 36104.0
No. Observations: 7093
Date: Mon, Feb 16 2026 Df Residuals: 7092
Time: 15:21:10 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu -8.2204e-03 2.835e-02 -0.290 0.772 [-6.378e-02,4.734e-02]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 0.0892 1.239 7.202e-02 0.943 [ -2.339, 2.517]
alpha[1] 0.0587 7.780e-02 0.754 0.451 [-9.382e-02, 0.211]
beta[1] 0.9413 0.256 3.678 2.354e-04 [ 0.440, 1.443]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -2.68148e+06
Distribution: Normal AIC: 5.36296e+06
Method: User-specified Parameters BIC: 5.36299e+06
No. Observations: 7096
Date: Mon, Feb 16 2026
Time: 15:21:10
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 8.343022 8.432228 8.521434 8.610640 8.699846
2026-02-11 8.034235 8.123442 8.212648 8.301854 8.391060
2026-02-12 7.965153 8.054359 8.143565 8.232772 8.321978
2026-02-13 7.635271 7.724478 7.813684 7.902890 7.992096
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -8841.72
Distribution: Normal AIC: 17691.4
Method: Maximum Likelihood BIC: 17716.5
No. Observations: 3901
Date: Mon, Feb 16 2026 Df Residuals: 3900
Time: 15:21:10 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.0996 2.612e-02 3.813 1.373e-04 [4.839e-02, 0.151]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 8.8983e-03 6.569e-03 1.355 0.176 [-3.977e-03,2.177e-02]
alpha[1] 0.0722 1.229e-02 5.881 4.083e-09 [4.817e-02,9.633e-02]
beta[1] 0.9278 1.428e-02 64.971 0.000 [ 0.900, 0.956]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -243492.
Distribution: Normal AIC: 486991.
Method: User-specified Parameters BIC: 487016.
No. Observations: 3904
Date: Mon, Feb 16 2026
Time: 15:21:10
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 6.911105 6.920091 6.929077 6.938063 6.947048
2026-02-11 8.566262 8.575268 8.584275 8.593281 8.602288
2026-02-12 8.662645 8.671653 8.680661 8.689668 8.698677
2026-02-13 8.095002 8.104003 8.113004 8.122004 8.131005
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -15452.4
Distribution: Normal AIC: 30912.8
Method: Maximum Likelihood BIC: 30940.0
No. Observations: 6621
Date: Mon, Feb 16 2026 Df Residuals: 6620
Time: 15:21:10 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.0881 2.783e-02 3.168 1.537e-03 [3.360e-02, 0.143]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 0.0664 2.557e-02 2.595 9.458e-03 [1.624e-02, 0.116]
alpha[1] 0.0540 1.119e-02 4.826 1.396e-06 [3.208e-02,7.596e-02]
beta[1] 0.9370 1.325e-02 70.708 0.000 [ 0.911, 0.963]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -502675.
Distribution: Normal AIC: 1.00536e+06
Method: User-specified Parameters BIC: 1.00539e+06
No. Observations: 6624
Date: Mon, Feb 16 2026
Time: 15:21:10
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 4.625270 4.650100 4.674707 4.699094 4.723261
2026-02-11 4.508649 4.534527 4.560172 4.585586 4.610773
2026-02-12 4.472842 4.499041 4.525004 4.550735 4.576234
2026-02-13 4.411641 4.438389 4.464897 4.491167 4.517201
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -16940.8
Distribution: Normal AIC: 33889.6
Method: Maximum Likelihood BIC: 33917.1
No. Observations: 7093
Date: Mon, Feb 16 2026 Df Residuals: 7092
Time: 15:21:11 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.1554 2.890e-02 5.377 7.583e-08 [9.874e-02, 0.212]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 0.0700 2.654e-02 2.640 8.300e-03 [1.804e-02, 0.122]
alpha[1] 0.0483 9.928e-03 4.865 1.146e-06 [2.884e-02,6.775e-02]
beta[1] 0.9446 1.167e-02 80.912 0.000 [ 0.922, 0.967]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -578098.
Distribution: Normal AIC: 1.15620e+06
Method: User-specified Parameters BIC: 1.15623e+06
No. Observations: 7096
Date: Mon, Feb 16 2026
Time: 15:21:11
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 8.258730 8.270010 8.281211 8.292331 8.303373
2026-02-11 7.913730 7.927465 7.941103 7.954643 7.968088
2026-02-12 7.645786 7.661428 7.676958 7.692378 7.707689
2026-02-13 8.181165 8.192997 8.204746 8.216410 8.227992
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -17462.0
Distribution: Normal AIC: 34931.9
Method: Maximum Likelihood BIC: 34959.1
No. Observations: 6620
Date: Mon, Feb 16 2026 Df Residuals: 6619
Time: 15:21:11 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.1177 4.077e-02 2.887 3.892e-03 [3.779e-02, 0.198]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 1.1929 0.892 1.337 0.181 [ -0.556, 2.941]
alpha[1] 0.0587 3.394e-02 1.730 8.364e-02 [-7.806e-03, 0.125]
beta[1] 0.8456 9.160e-02 9.231 2.677e-20 [ 0.666, 1.025]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -2.62349e+06
Distribution: Normal AIC: 5.24699e+06
Method: User-specified Parameters BIC: 5.24702e+06
No. Observations: 6623
Date: Mon, Feb 16 2026
Time: 15:21:11
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 16.765399 16.353629 15.981271 15.644552 15.340061
2026-02-11 15.477566 15.189058 14.928163 14.692240 14.478897
2026-02-12 14.478288 14.285423 14.111017 13.953305 13.810687
2026-02-13 14.618437 14.412158 14.225622 14.056940 13.904403
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -12943.7
Distribution: Normal AIC: 25895.3
Method: Maximum Likelihood BIC: 25922.2
No. Observations: 6091
Date: Mon, Feb 16 2026 Df Residuals: 6090
Time: 15:21:11 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.1293 2.716e-02 4.761 1.931e-06 [7.607e-02, 0.183]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 0.1547 8.297e-02 1.864 6.234e-02 [-7.971e-03, 0.317]
alpha[1] 0.0836 2.840e-02 2.942 3.262e-03 [2.789e-02, 0.139]
beta[1] 0.8868 4.154e-02 21.347 4.167e-101 [ 0.805, 0.968]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -293219.
Distribution: Normal AIC: 586446.
Method: User-specified Parameters BIC: 586473.
No. Observations: 6094
Date: Mon, Feb 16 2026
Time: 15:21:11
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 11.456505 11.271090 11.091178 10.916607 10.747218
2026-02-11 10.554557 10.395914 10.241981 10.092617 9.947687
2026-02-12 9.737062 9.602686 9.472299 9.345782 9.223020
2026-02-13 8.818160 8.711060 8.607140 8.506304 8.408461
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -18540.8
Distribution: Normal AIC: 37089.6
Method: Maximum Likelihood BIC: 37117.0
No. Observations: 6971
Date: Mon, Feb 16 2026 Df Residuals: 6970
Time: 15:21:11 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.0638 4.071e-02 1.567 0.117 [-1.598e-02, 0.144]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 0.0851 6.370e-02 1.336 0.182 [-3.974e-02, 0.210]
alpha[1] 0.0276 1.085e-02 2.547 1.087e-02 [6.372e-03,4.892e-02]
beta[1] 0.9669 1.469e-02 65.836 0.000 [ 0.938, 0.996]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -730737.
Distribution: Normal AIC: 1.46148e+06
Method: User-specified Parameters BIC: 1.46151e+06
No. Observations: 6974
Date: Mon, Feb 16 2026
Time: 15:21:11
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 15.372898 15.374674 15.376440 15.378196 15.379943
2026-02-11 15.007516 15.011272 15.015008 15.018723 15.022419
2026-02-12 14.901144 14.905477 14.909787 14.914073 14.918335
2026-02-13 15.621601 15.622028 15.622453 15.622876 15.623296
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -18235.1
Distribution: Normal AIC: 36478.1
Method: Maximum Likelihood BIC: 36505.5
No. Observations: 6834
Date: Mon, Feb 16 2026 Df Residuals: 6833
Time: 15:21:11 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.1177 4.038e-02 2.913 3.575e-03 [3.850e-02, 0.197]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 0.1294 9.690e-02 1.335 0.182 [-6.054e-02, 0.319]
alpha[1] 0.0407 1.935e-02 2.105 3.533e-02 [2.798e-03,7.865e-02]
beta[1] 0.9509 2.491e-02 38.167 0.000 [ 0.902, 1.000]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -683725.
Distribution: Normal AIC: 1.36746e+06
Method: User-specified Parameters BIC: 1.36749e+06
No. Observations: 6837
Date: Mon, Feb 16 2026
Time: 15:21:11
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 15.527731 15.526324 15.524929 15.523545 15.522174
2026-02-11 14.966846 14.970164 14.973453 14.976714 14.979949
2026-02-12 15.737398 15.734225 15.731079 15.727959 15.724866
2026-02-13 15.601639 15.599609 15.597597 15.595602 15.593623
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -1144.35
Distribution: Normal AIC: 2296.69
Method: Maximum Likelihood BIC: 2312.37
No. Observations: 372
Date: Mon, Feb 16 2026 Df Residuals: 371
Time: 15:21:11 Df Model: 1
Mean Model
coef std err t P>|t| 95.0% Conf. Int.
mu 0.4701 4.393 0.107 0.915 [ -8.141, 9.081]
Volatility Model
coef std err t P>|t| 95.0% Conf. Int.
omega 3.6103 203.967 1.770e-02 0.986 [-3.962e+02,4.034e+02]
alpha[1] 0.3482 7.219 4.823e-02 0.962 [-13.801, 14.497]
beta[1] 0.6518 4.283 0.152 0.879 [ -7.742, 9.046]


Covariance estimator: robust
'Fixed results:'
Constant Mean - GARCH Model Results
Dep. Variable: Close R-squared: --
Mean Model: Constant Mean Adj. R-squared: --
Vol Model: GARCH Log-Likelihood: -125551.
Distribution: Normal AIC: 251110.
Method: User-specified Parameters BIC: 251126.
No. Observations: 375
Date: Mon, Feb 16 2026
Time: 15:21:11
Mean Model
coef
mu 0.0235
Volatility Model
coef
omega 0.0100
alpha[1] 0.0600
beta[1] 0.0000


Results generated with user-specified parameters.
Std. errors not available when the model is not estimated,
h.1 h.2 h.3 h.4 h.5
Date
2026-02-10 10.592766 14.203046 17.813326 21.423607 25.033887
2026-02-11 10.591889 14.202169 17.812450 21.422730 25.033010
2026-02-12 10.591318 14.201598 17.811878 21.422159 25.032439
2026-02-13 10.590945 14.201226 17.811506 21.421786 25.032066
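The GARCH(1,1) fits above estimate the variance recursion σ²_t = ω + α·ε²_{t−1} + β·σ²_{t−1}. A minimal simulation of that recursion (the parameters here are illustrative, not the fitted ones) shows how volatility clusters and, as long as α + β < 1, mean-reverts to the unconditional variance ω/(1 − α − β):

```python
import random

random.seed(0)
omega, alpha, beta = 0.1, 0.06, 0.90  # illustrative parameters, alpha + beta < 1

# Unconditional (long-run) variance: omega / (1 - alpha - beta) = 2.5
long_run_var = omega / (1 - alpha - beta)
var = long_run_var
variances = []
for _ in range(5000):
    eps = random.gauss(0, 1) * var ** 0.5      # shock scaled by current volatility
    var = omega + alpha * eps**2 + beta * var  # GARCH(1,1) variance update
    variances.append(var)

avg_var = sum(variances) / len(variances)
print(long_run_var, avg_var)
```

A large shock inflates tomorrow's variance (via α) and that elevation persists (via β), which is the clustering the conditional volatility plots display; the sample average of the simulated variances hovers near the long-run level.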

Statistical checks

The next analysis checks the skewness and mode of the stock market data, among other statistical measures. These are important for a detailed understanding of the markets. Some analysts have argued that positive skewness is a good indicator for buying. Other statistical measures such as the mean and standard deviation are also important for stock market analysis and can help with buy/sell decisions.

Code
from scipy.stats import skew, mode, gaussian_kde

for ticker, data in all_data.items():
    close_values = data['Close']
    
    # Stats
    close_mean = np.mean(close_values)
    close_median = np.median(close_values)
    mode_val = close_values.mode().iloc[0]
    
    # Use Seaborn's KDE to get values
    kde = sns.kdeplot(close_values)  # create a temporary plot
    kde_data = kde.get_lines()[0].get_data()  # extract x, y values
    x_range, y_values = kde_data
    kde.figure.clf()  # clear the temporary Seaborn figure
    
    # Create Plotly figure
    fig = go.Figure()
    
    # KDE line
    fig.add_trace(go.Scatter(
        x=x_range,
        y=y_values,
        mode='lines',
        name='KDE',
        line=dict(color='blue')
    ))
    
    # Vertical lines
    fig.add_trace(go.Scatter(
        x=[close_mean, close_mean],
        y=[0, max(y_values)],
        mode='lines',
        line=dict(color='orange', dash='dash'),
        name='Mean'
    ))
    fig.add_trace(go.Scatter(
        x=[close_median, close_median],
        y=[0, max(y_values)],
        mode='lines',
        line=dict(color='black', dash='dash'),
        name='Median'
    ))
    fig.add_trace(go.Scatter(
        x=[mode_val, mode_val],
        y=[0, max(y_values)],
        mode='lines',
        line=dict(color='green', dash='dash'),
        name='Mode'
    ))
    
    # Layout
    fig.update_layout(
        title=f"Distribution of {ticker} Close Prices (Skewness)",
        xaxis_title="Price",
        yaxis_title="Density",
        width=800,
        height=500,
        template='plotly_white',
        legend=dict(title="Statistics")
    )
    
    fig.show()
Figure 6: Close price distributions (KDE with mean, median, and mode) for each ticker (panels a–j)
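The `skew` function imported above is not actually applied in the snippet, so here is a quick sketch of what skewness measures, on synthetic data (a right-skewed sample should score positive, with the mean dragged above the median):

```python
import pandas as pd

# Right-skewed sample: mostly small values plus a long right tail
sample = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 10, 25], dtype=float)

skewness = sample.skew()  # pandas uses the adjusted Fisher-Pearson estimator
print(skewness)
print(sample.mean(), sample.median())
```

The mean (5.7) sitting well above the median (3.0) is the visual signature of positive skewness in the KDE plots: a bulk of mass on the left with a stretched right tail.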

XGBoost

XGBoost is the first of the ML models used in this project. XGBoost is one of the most popular gradient boosting implementations and adapts exceptionally well to time series data once the dates are encoded as features. XGBoost is a quite complicated model, so it's easier to interpret the results than the model itself. The time series line plot for this model includes only the last 500 days of historical data for easier visualization.

Code
import xgboost as xgb
from sklearn.metrics import mean_squared_error

colors = px.colors.qualitative.Alphabet

#First it's important to go through the data and separate each feature for training
def create_features(df, label=None):
    df = df.copy()
    df['date'] = df.index
    df['date'] = pd.to_datetime(df['date'])
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofmonth'] = df['date'].dt.day
    df['weekofyear'] = df['date'].dt.isocalendar().week

    X = df[['hour','dayofweek','quarter','month','year',
           'dayofyear','dayofmonth','weekofyear']]
    if label:
        y = df[label]
        return X, y
    return X

fig = go.Figure()
for i, (ticker, data) in enumerate(processed_data.items()):
    current_color = colors[i % len(colors)]
    data = data.sort_index()
    split_date = '10-Feb-2026' 
    stock_train = data.loc[data.index <= split_date].copy()
    stock_test = data.loc[data.index > split_date].copy()

    X_train, y_train = create_features(stock_train, label='Close')
    X_test, y_test = create_features(stock_test, label='Close')

    reg = xgb.XGBRegressor(n_estimators=1000, early_stopping_rounds=50)
    reg.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)], verbose=False)

    forecast_periods = 50
    
    data_recent = data.tail(500).copy()
    data_recent.index = pd.to_datetime(data_recent.index)
    data_recent = data_recent.sort_index()

    hist_x = data_recent.index
    future_start = hist_x[-1] + pd.Timedelta(days=1)
    future_dates = pd.date_range(start=future_start, periods=forecast_periods, freq='B')
    future_df = pd.DataFrame(index=future_dates)
    X_future = create_features(future_df)

    forecast = reg.predict(X_future)

    last_hist_date = data_recent.index[-1]
    last_hist_close = data_recent['Close'].iloc[-1]

    plot_forecast_dates = pd.Index([last_hist_date]).append(future_dates)
    plot_forecast_values = np.concatenate(([last_hist_close], forecast))

    fig.add_trace(go.Scatter(
        x=data_recent.index,
        y=data_recent['Close'],
        mode='lines',
        name=f'Historical Market Close of {ticker}',
        line=dict(color=current_color)
    ))

    fig.add_trace(go.Scatter(
        x=plot_forecast_dates,
        y=plot_forecast_values,
        mode='lines',
        name=f'Predicted Future Close of {ticker}',
        line=dict(color=current_color, dash='dash')
    ))
    
fig.update_layout(
    title=f'Stock Close Price vs XGBoost Prediction',
    xaxis_title='Date',
    yaxis_title='Price',
    template='plotly_white'
)

fig.show()
Figure 7
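The `create_features` helper above reduces each date to calendar components; XGBoost then learns a mapping from those components to the close price. The same feature extraction can be sketched more compactly, operating directly on a `DatetimeIndex`:

```python
import pandas as pd

dates = pd.date_range("2026-02-09", periods=5, freq="B")  # one business week
features = pd.DataFrame({
    "dayofweek": dates.dayofweek,   # Monday=0 ... Friday=4
    "month": dates.month,
    "dayofyear": dates.dayofyear,
    "weekofyear": dates.isocalendar().week.to_numpy(),
}, index=dates)
print(features)
```

One limitation worth keeping in mind: trees can only predict values seen in training, so purely calendar-based features cannot extrapolate a price level outside the historical range, which is one reason tree-based forecasts tend to flatten out at the end of the horizon.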

Results

This section is an overview of the results from the preceding analyses and forecasts. Because the markets are extremely volatile and many of the stocks, most notably ASML, have been skyrocketing in value lately, making forecasts is difficult; some of the stocks were already expected to fall according to the most recent data.

From the first line chart (Figure 1) and the market trend analysis you can see which companies have the strongest trends. ASML has been performing exceptionally, but its stock value is currently experiencing a significant decrease. The other companies have similarly volatile stocks.

From the MACD (Figure 3) and RSI (Figure 4) indicator analyses it's easy to see that the markets are very volatile. The RSI plots swing heavily between oversold and overbought for most of the companies. This makes stock market analysis especially difficult and leaves the markets heavily exposed to speculation. One of the most 'stable' markets is that of AWEVF, partly due to it being a new company.

From the GARCH model and the related statistical analysis, you can determine the most important trends and qualities when it comes to market volatility. In this case, the most important plots to look at are the 'estimated vs fixed volatility' and forecast plots (remember to zoom in on the end of the series to see the forecast). These plots can inform decisions in risk management, banking regulation, and derivatives management.

The statistical checks are somewhat optional, but measures such as skewness (Figure 6) can provide significant details about the stock markets.

Finally, the project's time series forecast model: XGBoost (Figure 7). As you can see from the graphs, XGBoost gives quite realistic forecasts of the market close values. However, for some of the tickers you can see how the XGBoost model might be too primitive and produce unrealistic forecasts.

The goal of this project has been to provide a wide variety of tools and models for stock market analysis and forecasting that can be applied to trading, investing, portfolio management, etc. The models should not be treated as firmly accurate, but rather as experimental.